Database normalization is the process of structuring a relational database to minimize redundancy and dependency by organizing fields and table relations.
The normal forms are:
1NF ensures that all attributes contain only atomic values and each column contains values of a single type.
CREATE TABLE Student ( StudentID INT PRIMARY KEY, Name VARCHAR(100), Subjects VARCHAR(255) -- Not in 1NF if multiple subjects stored in one column );
2NF removes partial dependencies, ensuring that non-key attributes are fully functionally dependent on the primary key.
Example of a table violating 2NF:
CREATE TABLE Orders ( OrderID INT, ProductID INT, ProductName VARCHAR(100), PRIMARY KEY (OrderID, ProductID) );
To correct it, split into two tables:
CREATE TABLE Orders ( OrderID INT PRIMARY KEY ); CREATE TABLE OrderDetails ( OrderID INT, ProductID INT, ProductName VARCHAR(100), PRIMARY KEY (OrderID, ProductID), FOREIGN KEY (OrderID) REFERENCES Orders(OrderID) );
3NF eliminates transitive dependencies, meaning that non-key attributes should not depend on other non-key attributes.
BCNF is a stricter version of 3NF where every determinant is a candidate key.
Normalization reduces redundancy, improving data integrity, but may require more joins, potentially impacting query performance.
Denormalization is useful when read performance is critical, reducing joins at the cost of redundancy.
import java.sql.Connection; import java.sql.DriverManager; import java.sql.PreparedStatement; import java.sql.ResultSet; public class DatabaseQueryExample { public static void main(String[] args) { try { Connection conn = DriverManager.getConnection("jdbc:mysql://localhost:3306/school", "root", "password"); PreparedStatement stmt = conn.prepareStatement("SELECT * FROM Students WHERE StudentID = ?"); stmt.setInt(1, 1); ResultSet rs = stmt.executeQuery(); while (rs.next()) { System.out.println(rs.getString("Name")); } conn.close(); } catch (Exception e) { e.printStackTrace(); } } }
Normalization primarily aims to reduce redundancy and improve data integrity.
No, a table must be in 1NF before it can be considered for 2NF.
Many-to-many relationships are handled using a junction table.
CREATE TABLE StudentCourse ( StudentID INT, CourseID INT, PRIMARY KEY (StudentID, CourseID), FOREIGN KEY (StudentID) REFERENCES Student(StudentID), FOREIGN KEY (CourseID) REFERENCES Course(CourseID) );
BCNF is stricter than 3NF as it requires that every determinant is a candidate key.
BCNF can lead to complex schemas with more tables, increasing the number of joins required for queries.
A table where a non-primary attribute functionally determines another key violates BCNF.
4NF eliminates multi-valued dependencies to prevent redundancy.
It ensures that each multi-valued attribute is stored in a separate table.
A multi-valued dependency occurs when one attribute depends on another, independent of other attributes.
@Entity @Table(name = "students") public class Student { @Id @GeneratedValue(strategy = GenerationType.IDENTITY) private Long id; private String name; @OneToMany(mappedBy = "student") private Listenrollments; }
5NF deals with join dependencies and ensures that the decomposition of relations does not result in data loss.
5NF helps in removing redundancy caused by join dependencies that are not implied by candidate keys.
DKNF ensures that all constraints and dependencies are enforced by domain and key constraints.
Denormalization is the process of merging tables to optimize query performance at the cost of redundancy.
Denormalization should be used when database performance is more critical than eliminating redundancy.
CREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerName VARCHAR(255), ProductName VARCHAR(255), Quantity INT );
Indexes speed up searches and retrieval by providing quick lookup mechanisms.
CREATE INDEX idx_customer_name ON Customers(CustomerName);
A foreign key constraint ensures referential integrity between two tables.
@Entity @Table(name = "orders") public class Order { @Id @GeneratedValue(strategy = GenerationType.IDENTITY) private Long id; @ManyToOne @JoinColumn(name = "customer_id", nullable = false) private Customer customer; }
Star schema has denormalized dimensions, while snowflake schema normalizes dimensions to reduce redundancy.
Star schema offers simpler queries and faster retrieval due to fewer joins.
A surrogate key is a unique, system-generated key used as a primary key instead of a natural key.
Normalization increases the number of tables, requiring careful indexing strategies to maintain performance.
A composite key is a primary key that consists of two or more columns.
CREATE TABLE Enrollment ( StudentID INT, CourseID INT, PRIMARY KEY (StudentID, CourseID) );
A unique constraint ensures that all values in a column or set of columns are distinct.
CREATE TABLE Employees ( EmployeeID INT PRIMARY KEY, Email VARCHAR(255) UNIQUE );
Normalization tools help automate database schema optimization to achieve normalization levels.
@Embeddable public class EnrollmentId implements Serializable { private Long studentId; private Long courseId; } @Entity @Table(name = "enrollment") public class Enrollment { @EmbeddedId private EnrollmentId id; }
A natural key is a key that has a real-world meaning, such as a Social Security Number or email address.
A candidate key is a column, or set of columns, that can uniquely identify a record in a table.
A primary key uniquely identifies records in its own table, while a foreign key references a primary key in another table.
CREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerID INT, FOREIGN KEY (CustomerID) REFERENCES Customers(CustomerID) );
A transitive dependency occurs when a non-key attribute depends on another non-key attribute, violating 3NF.
BCNF is a stricter form of 3NF that removes any remaining functional dependencies where non-key attributes determine part of the key.
Denormalization reduces joins and improves performance by reintroducing redundancy into a normalized schema.
CREATE TABLE Orders ( OrderID INT PRIMARY KEY, CustomerName VARCHAR(255), ProductName VARCHAR(255) );
4NF eliminates multi-valued dependencies, ensuring that each record represents a single fact.
@Entity public class Order { @Id @GeneratedValue(strategy = GenerationType.IDENTITY) private Long orderId; @ManyToOne @JoinColumn(name = "customer_id") private Customer customer; }
Normalization reduces redundancy and ensures data integrity, while denormalization improves read performance by reducing joins.
5NF ensures that all join dependencies are lossless and eliminates redundancy due to multi-join relationships.
A surrogate key is an artificially generated unique identifier, such as an auto-incremented ID.
CREATE TABLE Customers ( CustomerID SERIAL PRIMARY KEY, Name VARCHAR(255) NOT NULL );
DKNF ensures that every constraint on the database is a logical consequence of domain and key constraints.
Indexes improve query performance, while normalization optimizes storage and data integrity.
@Embeddable public class OrderKey implements Serializable { private Long orderId; private Long productId; } @Entity public class Order { @EmbeddedId private OrderKey id; }
A functional dependency occurs when one attribute uniquely determines another attribute in a relation.
CREATE INDEX idx_customer_name ON Customers(Name);
A clustered index determines the physical order of data in a table, while a non-clustered index maintains a separate structure for lookups.
A transitive dependency exists when a non-key attribute depends on another non-key attribute instead of the primary key.
CREATE TABLE Employees ( EmployeeID INT PRIMARY KEY, DepartmentID INT, DepartmentName VARCHAR(255) ); -- Normalize by creating a separate Department table CREATE TABLE Departments ( DepartmentID INT PRIMARY KEY, DepartmentName VARCHAR(255) );
2NF removes partial dependencies, while 3NF removes transitive dependencies.
BCNF ensures that every determinant is a candidate key, further reducing redundancy.
@Entity public class Employee { @Id @GeneratedValue private Long id; @ManyToOne @JoinColumn(name = "department_id") private Department department; } @Entity public class Department { @Id @GeneratedValue private Long id; private String name; }
A superkey is a set of attributes that uniquely identifies a tuple in a relation.
A candidate key is a minimal superkey that contains no unnecessary attributes.
CREATE TABLE OrderDetails ( OrderID INT, ProductID INT, Quantity INT, PRIMARY KEY (OrderID, ProductID) );
6NF deals with temporal databases by eliminating non-trivial join dependencies.
Normalization improves data integrity but may require additional joins, potentially impacting query performance.
4NF eliminates multi-valued dependencies to prevent redundancy in multi-valued fields.
CREATE TABLE EmployeeSkills ( EmployeeID INT, Skill VARCHAR(255), Certification VARCHAR(255) );
By splitting multi-valued attributes into separate tables.
CREATE TABLE EmployeeSkills ( EmployeeID INT, Skill VARCHAR(255), PRIMARY KEY (EmployeeID, Skill) ); CREATE TABLE EmployeeCertifications ( EmployeeID INT, Certification VARCHAR(255), PRIMARY KEY (EmployeeID, Certification) );
5NF eliminates join dependencies by ensuring all data is represented without redundant decomposition.
@Entity public class EmployeeSkill { @Id @GeneratedValue private Long id; private String skill; @ManyToOne @JoinColumn(name = "employee_id") private Employee employee; }
It occurs when one attribute determines multiple independent attributes.
It removes redundancy from complex join dependencies.
CREATE TABLE Projects ( EmployeeID INT, ProjectID INT, Role VARCHAR(255), PRIMARY KEY (EmployeeID, ProjectID, Role) );
By decomposing the table into smaller tables that eliminate redundancy.
Because most real-world databases do not require this level of normalization due to performance concerns.
6NF deals with temporal data and ensures that no redundancy exists by breaking down data into the smallest possible parts.
CREATE TABLE EmployeeHistory ( EmployeeID INT, StartDate DATE, EndDate DATE, Department VARCHAR(255), PRIMARY KEY (EmployeeID, StartDate) );
By ensuring that data changes over time are tracked precisely without redundancy.
5NF eliminates join dependencies, while 6NF focuses on temporal dependencies.
@Entity public class EmployeeHistory { @Id @GeneratedValue private Long id; private LocalDate startDate; private LocalDate endDate; private String department; }
6NF can lead to a high number of joins and may impact query performance negatively.
Indexes help optimize search queries by speeding up lookups in highly decomposed tables.
CREATE INDEX idx_employee_history ON EmployeeHistory (EmployeeID, StartDate);
Normalization is not recommended when performance is a priority over data integrity, such as in read-heavy applications.
Denormalization is the process of combining tables to reduce joins and improve read performance at the cost of redundancy.